Parametrizing Compton form factors with neural networks
Abstract

We describe a method, based on neural networks, for extracting Compton form factors in the deeply virtual region. We compare this approach to standard least-squares model fitting, both for a simplified toy case and for HERMES data.



1 Introduction 

Extraction of generalized parton distributions (GPDs) [1-3] from exclusive scattering data is an important endeavour, related to such practical questions as the partonic decomposition of the nucleon spin [4] and the characterization of multiple-hard reactions in proton-proton collisions at the LHC [5, 6]. To reveal the shape of GPDs, one employs global or local fits to data [7-13]. However, compared to the familiar global fits of parton distribution functions (PDFs), fitting GPDs is intricate, because they depend on three kinematical variables (at a fixed input scale $Q_0$) and because they cannot be fully constrained even by ideal data. Thus, final results can be significantly influenced by the choice of the particular fitting ansatz. To deal with this source of theoretical uncertainty, we used an alternative approach [14], in which neural networks are used in place of specific models. This approach has already been successfully applied to the extraction of the deeply inelastic scattering (DIS) structure function $F_2$ and of ordinary PDFs [15-17]. We expect the power of this approach to be even greater in the case of GPDs. In light of the scarce experimental data, in this pilot study we attempted the mathematically simpler extraction of the Compton form factor $\mathcal{H}(x_B, t)$ of deeply virtual Compton scattering (DVCS). We used data from the kinematical region where this Compton form factor (CFF) dominates the observables and depends essentially on only two kinematical variables: Bjorken's scaling variable $x_B$ and the squared proton momentum transfer $t$. These simplifications make the whole problem more tractable.

2 The method 

Neural networks were invented several decades ago in an attempt to create computer algorithms able to classify (i.e., recognize) complex patterns. The specific type of neural network used in this work, known as a multilayer perceptron, is a mathematical structure consisting of a number of interconnected "neurons" organized in several layers. It is shown schematically in Fig. 1, where each blob symbolizes a single neuron. Each neuron has several inputs and one output. The value at the output is given as a function $f(\sum_j w_j x_j)$ of the sum of the input values $x_1, x_2, \ldots$, each weighted by a certain number $w_j$.

Figure 1: The structure of a neural network that represents a set of CFFs $\{\mathcal{H}(x_B, t), \mathcal{E}(x_B, t), \ldots\}$. The network is trained by calculating observables (cross-sections $\sigma(x_B, t)$ or asymmetries) from the CFFs, comparing them to experimentally measured values, and then adjusting the network parameters to minimize the squared errors.
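The neuron described above can be sketched in a few lines of Python. This is only an illustration: the logistic sigmoid is our assumed choice for the activation $f$ (the text does not fix it), and the function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation, assumed here as the choice of f."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs: list[float], weights: list[float]) -> float:
    """Return f(sum_j w_j x_j) for a single neuron."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(weighted_sum)


# A neuron with three inputs x_1, x_2, x_3 and weights w_1, w_2, w_3
print(neuron_output([0.2, -1.0, 0.5], [1.5, 0.3, -2.0]))
```

In a multilayer perceptron, the outputs of one layer of such neurons simply become the inputs of the next layer.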

The parameters of a neural network (the weights $w_j$) are adjusted by a procedure known as "training" or "learning". In this procedure, the input part of a chosen set of training input-output patterns is presented to the input layer and propagated through the network to the output layer. The output values are then compared to the known values of the output part of the training patterns, and the calculated differences are used to adjust the network weights. This procedure is repeated until the network can correctly classify all (or almost all) input patterns. If this is done properly, the trained neural network is capable of generalization, i.e., it can successfully classify patterns it has never seen before.

This whole paradigm can also be applied to fitting functions to data. Here, the measured data are the patterns, the inputs are the values of the kinematical variables on which the observable in question depends, and the output is the value of this observable; see Fig. 1. In this case, the generalization property of neural networks represents their ability to provide a reasonable estimate of the actual underlying physical law. For the particular application of neural networks to fits of hadron structure functions, we refer the reader to the papers of the NNPDF group [15-18]. Our approach is similar and is described in detail in [14, …].
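To make the training step concrete, here is a minimal sketch that fits a one-input network with a single hidden layer to data points $(x_i, y_i \pm \Delta y_i)$ by minimizing the squared errors weighted by the data uncertainties. Several choices are our own assumptions and are not taken from [14] or the NNPDF papers: tanh hidden neurons with bias terms, a linear output neuron, plain full-batch gradient descent with hand-coded backpropagation, and the step-size heuristic. The minimizers actually used in those works may differ, and the toy data below are hypothetical.

```python
import math
import random


def init_net(n_hidden: int, rng: random.Random) -> dict:
    """Random weights for a 1-input, n_hidden-tanh, 1-linear-output network."""
    return {
        "w1": [rng.gauss(0.0, 1.0) for _ in range(n_hidden)],  # input -> hidden
        "b1": [rng.gauss(0.0, 1.0) for _ in range(n_hidden)],  # hidden biases
        "w2": [rng.gauss(0.0, 1.0) for _ in range(n_hidden)],  # hidden -> output
        "b2": 0.0,                                             # output bias
    }


def forward(net: dict, x: float) -> tuple[float, list[float]]:
    """Propagate x through the network; return the output and hidden activations."""
    hidden = [math.tanh(w * x + b) for w, b in zip(net["w1"], net["b1"])]
    out = sum(v * h for v, h in zip(net["w2"], hidden)) + net["b2"]
    return out, hidden


def chi2(net: dict, xs, ys, dys) -> float:
    """Squared errors weighted by the data uncertainties."""
    return sum(((y - forward(net, x)[0]) / dy) ** 2 for x, y, dy in zip(xs, ys, dys))


def train(net: dict, xs, ys, dys, epochs: int = 2000) -> dict:
    """Full-batch gradient descent on chi^2 (updates net in place)."""
    n = len(net["w1"])
    # Conservative step size (a heuristic assumption), scaled by sum(1/dy^2).
    lr = 0.05 / ((n + 1) * sum(1.0 / dy**2 for dy in dys))
    for _ in range(epochs):
        g_w1, g_b1, g_w2, g_b2 = [0.0] * n, [0.0] * n, [0.0] * n, 0.0
        for x, y, dy in zip(xs, ys, dys):
            out, hidden = forward(net, x)
            d_out = -2.0 * (y - out) / dy**2  # d chi^2 / d out for this point
            g_b2 += d_out
            for j in range(n):
                g_w2[j] += d_out * hidden[j]
                # chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2
                d_z = d_out * net["w2"][j] * (1.0 - hidden[j] ** 2)
                g_w1[j] += d_z * x
                g_b1[j] += d_z
        for j in range(n):
            net["w1"][j] -= lr * g_w1[j]
            net["b1"][j] -= lr * g_b1[j]
            net["w2"][j] -= lr * g_w2[j]
        net["b2"] -= lr * g_b2
    return net


# Hypothetical data: ten equidistant points with uncertainty 0.05
rng = random.Random(1)
xs = [0.05 + 0.1 * i for i in range(10)]
ys = [x * (1.0 - x) for x in xs]
dys = [0.05] * len(xs)
net = init_net(8, rng)
print(f"chi2 before: {chi2(net, xs, ys, dys):.1f}")
train(net, xs, ys, dys)
print(f"chi2 after:  {chi2(net, xs, ys, dys):.1f}")
```

Gradient descent is used here only because it is short to write out by hand. Any minimizer of the same weighted squared error plays the same role.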

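To propagate experimental uncertainties into the final result, we use the "Monte Carlo" method […], in which neural networks are trained not on the actual data but on a collection of "replica data sets". These sets are obtained from the original data by generating random artificial data points according to a Gaussian probability distribution whose width is defined by the error bar of the experimental measurement. Taking a large number $N_{\rm rep}$ of such replicas, the resulting collection of trained neural networks $\mathcal{H}^{(1)}, \ldots, \mathcal{H}^{(N_{\rm rep})}$ defines a probability distribution $\mathcal{P}[\mathcal{H}]$ of the represented CFF $\mathcal{H}(x_B, t)$ and of any functional $\mathcal{F}[\mathcal{H}]$ thereof. Thus, the mean value of such a functional and its variance are […]

$$\langle \mathcal{F}[\mathcal{H}] \rangle = \int \mathcal{D}\mathcal{H}\, \mathcal{P}[\mathcal{H}]\, \mathcal{F}[\mathcal{H}] = \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} \mathcal{F}[\mathcal{H}^{(k)}] , \qquad (1)$$

$$\mathrm{Var}\big(\mathcal{F}[\mathcal{H}]\big) = \langle \mathcal{F}[\mathcal{H}]^2 \rangle - \langle \mathcal{F}[\mathcal{H}] \rangle^2 . \qquad (2)$$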
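The replica procedure and Eqs. (1)-(2) can be sketched in a few lines of Python. Replica generation follows the description above: each point is smeared with its own error bar. As a purely hypothetical stand-in for training a network on each replica, the "functional" evaluated per replica here is just the inverse-variance weighted mean of that replica's points. In the real procedure it would be some quantity computed from the trained network $\mathcal{H}^{(k)}$.

```python
import math
import random


def make_replicas(ys, dys, n_rep, rng):
    """Gaussian replicas: each point y_i is smeared with width dy_i."""
    return [[rng.gauss(y, dy) for y, dy in zip(ys, dys)] for _ in range(n_rep)]


def mc_mean_and_std(values):
    """Eqs. (1)-(2): ensemble mean and sqrt(<F^2> - <F>^2) over replicas."""
    n = len(values)
    mean = sum(values) / n
    var = sum(v * v for v in values) / n - mean**2
    return mean, math.sqrt(max(var, 0.0))  # guard against tiny negative round-off


def weighted_mean(ys, dys):
    """Hypothetical stand-in for a fit: inverse-variance weighted average."""
    w = [1.0 / dy**2 for dy in dys]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


rng = random.Random(42)
ys = [1.02, 0.97, 1.05, 0.99]  # hypothetical measurements
dys = [0.05, 0.05, 0.10, 0.05]
replicas = make_replicas(ys, dys, n_rep=1000, rng=rng)
values = [weighted_mean(rep, dys) for rep in replicas]
mean, std = mc_mean_and_std(values)
print(f"F = {mean:.3f} +- {std:.3f}")
```

For this linear stand-in, the Monte Carlo spread should reproduce the analytic error $1/\sqrt{\sum_i \Delta y_i^{-2}}$. For a nonlinear fit such as a neural network, no such closed form is needed, which is the point of the method.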



3 Toy example 



To illustrate the neural network fitting method, we now present a toy example in which we extract a known function of one variable by fitting to fake data. First, we define a simple target function $f(x)$ as a random composition of simple polynomial and logarithmic functions, constrained by the property

$$f(1) = 0 . \qquad (3)$$

This function is plotted in Fig. 2 as a thick dashed line labeled "target".

Next, $n_{\rm pts} = 10$ fake data points $(x_i, y_i \pm \Delta y_i)$ are generated equidistantly in $x$. Their mean values $y_i$ are smeared around the target values by random Gaussian fluctuations with standard deviation $\Delta y_i = 0.05$, which is also taken to be the uncertainty of the generated points. These fake data are then fitted, first by the standard least-squares method with a two-parameter model

$$f(x) = \ldots , \qquad (4)$$

and, second, by the neural network method. Note that the Monte Carlo method of error propagation, which we use together with neural network fitting, itself requires the generation of artificial data sets. Thus, we generated $N_{\rm rep} = 12$ replicas from the original fake data and used them to train 12 neural networks that represent 12 functions, plotted as thin solid lines in the second panel of Fig. 2. These functions define a probability distribution in the space of functions $f(x)$ which, according to Eqs. (1)-(2), provides an estimate of the sought function $f(x)$ together with its uncertainty. This estimate is shown in the right panel of Fig. 2 as a (red) band with ascending hatches. The corresponding model fit result, obtained by standard least-squares optimization and error propagation using the Hessian matrix, is shown in the left panel of Fig. 2 as a (green) band with descending hatches.

Figure 2: Toy examples of fitting to fake data generated from the underlying target function (dashed). The first panel shows the result of a standard least-squares model fit, the second one shows twelve neural networks trained on the Monte Carlo replicas of the fake data, and the third panel shows the uncertainty band obtained by statistical averaging of the neural networks displayed in the second panel.
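The fake-data step can be sketched as follows. The actual target function of Fig. 2 is not given in closed form here, so `target` below is a hypothetical stand-in that merely respects Eq. (3). Placing the ten equidistant points at bin midpoints inside (0, 1) is likewise our assumption.

```python
import math
import random


def target(x: float) -> float:
    """Hypothetical target: a polynomial-times-logarithm mix with f(1) = 0."""
    return (1.0 - x) * (0.5 - 0.2 * math.log(x))


def make_fake_data(f, n_pts=10, sigma=0.05, seed=0):
    """Equidistant points in (0, 1), smeared by Gaussian noise of width sigma."""
    rng = random.Random(seed)
    xs = [(i + 0.5) / n_pts for i in range(n_pts)]  # midpoints of n_pts bins
    ys = [f(x) + rng.gauss(0.0, sigma) for x in xs]
    dys = [sigma] * n_pts  # the smearing width doubles as the quoted error
    return xs, ys, dys


xs, ys, dys = make_fake_data(target)
for x, y, dy in zip(xs, ys, dys):
    print(f"x = {x:.2f}   y = {y:.3f} +- {dy}")
```

Fixing the seed makes the fake data set reproducible, so that the model fit and the neural network fit see exactly the same points.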

We have deliberately chosen the ansatz (4) with two properties that incorporate theoretical biases about the endpoints: $f(1) = 0$ and $f(0) = 0$. The first of these actually "corresponds to the truth", i.e., to Eq. (3), whereas the second one is erroneous. As a result, for $x \to 1$ the model fit is in much better agreement with the target function (thick dashed line) than the neural networks, which rely only on data and are insensitive to this endpoint behaviour. On the other hand, for $x \to 0$ the model fit is in slight disagreement with the target function and, what is much worse, it severely underestimates the uncertainty of the fitted function there (the uncertainty vanishes at the endpoints!), demonstrating the dangers of unwarranted theoretical prejudices.

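We can be more quantitative: according to the standard measure

$$\chi^2 = \sum_{i=1}^{n_{\rm pts}} \frac{\big(y_i - f(x_i)\big)^2}{\Delta y_i^2} ,$$

both methods lead to functions that correctly describe the data¹:

$$\chi^2_{\rm model}/n_{\rm pts} = 11.9/10 ; \qquad \chi^2_{\rm neur.\,net}/n_{\rm pts} = 12.3/10 .$$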

¹We ignore here the difference between the number of data points $n_{\rm pts}$ and the number of degrees of freedom: neural networks have very many free parameters, and for them the number of degrees of freedom is not as important a characteristic as it is for standard model fits.
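The standard measure $\chi^2$ is straightforward to evaluate once the fitted function is known at the data points. In the sketch below, the data and fitted values are made up purely for illustration.

```python
def chi2(ys, dys, predictions):
    """Standard measure: sum_i (y_i - f(x_i))^2 / dy_i^2."""
    if not (len(ys) == len(dys) == len(predictions)):
        raise ValueError("ys, dys and predictions must have equal length")
    return sum(((y - p) / dy) ** 2 for y, dy, p in zip(ys, dys, predictions))


ys = [0.40, 0.31, 0.22]   # hypothetical data
dys = [0.05, 0.05, 0.05]
fit = [0.45, 0.30, 0.20]  # hypothetical fitted values f(x_i)
print(f"chi2/n_pts = {chi2(ys, dys, fit):.2f}/{len(ys)}")
```

A value of $\chi^2/n_{\rm pts}$ close to one, as for both fits above, indicates that the fitted function describes the data within their quoted errors.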
We can now further ask to what extent the two methods extract the underlying target function $f(x)$. Naturally, we can measure this by the criterion

$$\tilde{\chi}^2 = \sum_{i=1}^{n_{\rm pts}} \frac{\big(f(x_i) - f_{\rm target}(x_i)\big)^2}{\Delta f(x_i)^2} ,$$

where the denominator is now the propagated uncertainty $\Delta f(x_i)$ rather than the experimental one $\Delta y_i$. In our toy example we get

$$\tilde{\chi}^2_{\rm model}/n_{\rm pts} = 25.6/10 ; \qquad \tilde{\chi}^2_{\rm neur.\,net}/n_{\rm pts} = \ldots.4/10 ,$$

showing that the model fit underestimates its uncertainties, whereas the neural networks are much more realistic.
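The criterion $\tilde\chi^2$ can be evaluated directly from an ensemble of replica fits. By Eqs. (1)-(2), $f(x_i)$ is the ensemble mean and $\Delta f(x_i)$ the ensemble standard deviation at each point. The sketch assumes each replica function has already been evaluated at the points $x_i$, and all numbers are hypothetical.

```python
import math


def ensemble_band(replica_values):
    """Mean and std (Eqs. (1)-(2)) of replica predictions at one point."""
    n = len(replica_values)
    mean = sum(replica_values) / n
    var = sum(v * v for v in replica_values) / n - mean**2
    return mean, math.sqrt(max(var, 0.0))


def chi2_tilde(per_point_replicas, target_values):
    """sum_i (f(x_i) - f_target(x_i))^2 / Delta f(x_i)^2."""
    total = 0.0
    for values, f_true in zip(per_point_replicas, target_values):
        mean, std = ensemble_band(values)
        if std == 0.0:
            raise ValueError("zero ensemble spread: uncertainty cannot be estimated")
        total += ((mean - f_true) / std) ** 2
    return total


# Three points, four hypothetical replica predictions at each
per_point = [[0.40, 0.44, 0.38, 0.42], [0.30, 0.33, 0.27, 0.30], [0.20, 0.26, 0.22, 0.24]]
truth = [0.42, 0.29, 0.20]
print(f"chi2_tilde/n_pts = {chi2_tilde(per_point, truth):.1f}/{len(truth)}")
```

Halving the spread of the ensemble at fixed mean multiplies $\tilde\chi^2$ by four. This is exactly how an underestimated uncertainty band, as in the model fit, shows up in this criterion.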

This example shows that the neural network method has a clear advantage if we want bias-free propagation of information from experimental measurements into the CFFs. Still, if we want to use some additional input, e.g., if we rely on the spectral property (3), we can do so within the neural network method as well. For example, we could take the output of the neural networks in this toy example not as a representation of the function $f(x)$ itself, but as representing $f(x)/(1-x)^p$ with some positive power $p$. Then the final neural network predictions for $f(x)$ would also be constrained by Eq. (3), without any further loss of generality (in practice, the dependence of the results on the choice of the power $p$ turns out to be small). Various methods of implementing theoretical constraints in the neural network fitting method are discussed in Sect. 5.2.4 of [18].
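The reparametrization just described amounts to multiplying the network output by a prefactor that vanishes at $x = 1$. In this sketch, `raw_net` is a hypothetical stand-in for a trained network, and $p = 1$ is an arbitrary illustrative choice.

```python
from typing import Callable


def with_endpoint_constraint(raw_net: Callable[[float], float], p: float) -> Callable[[float], float]:
    """Return f(x) = (1 - x)^p * raw_net(x), so that f(1) = 0 for any p > 0.

    Intended for 0 <= x <= 1; for x > 1 and non-integer p the power is not real.
    """
    if p <= 0:
        raise ValueError("the power p must be positive")

    def f(x: float) -> float:
        return (1.0 - x) ** p * raw_net(x)

    return f


def raw_net(x: float) -> float:
    """Hypothetical unconstrained network output."""
    return 0.8 - 0.3 * x


f = with_endpoint_constraint(raw_net, p=1.0)
print(f(0.5), f(1.0))
```

Because the constraint is built into the functional form, it holds for every replica network, whatever values the training assigns to its weights.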



4 Application to HERMES data 



To extract the CFF $\mathcal{H}$ from asymmetries [21] measured by the HERMES collaboration in photon electroproduction off unpolarized protons, we applied the neural network fitting method described above in [14]. We used 36 data points: 18 measurements of the first sine harmonic $A^{\sin\phi}$ of the beam spin asymmetry and 18 measurements of the first cosine harmonic $A^{\cos\phi}$ of the beam charge asymmetry. As for the toy model of the previous section, we compare the results with a standard least-squares model fit. Let us first briefly describe this model fit of $\mathcal{H}$. For the partonic decomposition of the imaginary part $\Im{\rm m}\,\mathcal{H}$ we used a model presented in [10]:



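$$\Im{\rm m}\,\mathcal{H}(x_{\rm Bj}, t) = \pi \left[ \frac{4}{9} H^{u^{\rm val}}(\xi, \xi, t) + \frac{1}{9} H^{d^{\rm val}}(\xi, \xi, t) + \frac{2}{9} H^{\rm sea}(\xi, \xi, t) \right] , \qquad \xi \approx \frac{x_{\rm Bj}}{2 - x_{\rm Bj}} .$$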



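Here, $H^a(x, x, t)$ are the GPDs of parton species $a$ along the cross-over trajectory $\xi = x$, parameterized as

$$H^a(x, x, t) = \frac{n^a r^a}{1 + x} \left( \frac{2x}{1 + x} \right)^{-\alpha^a(t)} \left( \frac{1 - x}{1 + x} \right)^{b^a} \left( 1 - \frac{1 - x}{1 + x} \frac{t}{(M^a)^2} \right)^{-p^a} .$$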



The parameters of $H^{\rm sea}$ were fixed by separate fits […] to collider data, and some parameters of $H^{\rm val}$ were also fixed using information from DIS data and Regge trajectories $\alpha(t)$. The real part $\Re{\rm e}\,\mathcal{H}$ is expressed in terms of the imaginary one via a dispersion integral […, 24] and the subtraction constant $C$, leaving us finally with a model that possesses four parameters: $r^{\rm val}$, $b^{\rm val}$, $M^{\rm val}$ and $C$. This model is fitted to experimental data, resulting in parameter values, which can be found in […], and in shapes of $\Im{\rm m}\,\mathcal{H}$ and $\Re{\rm e}\,\mathcal{H}$ that are plotted in Fig. 3 as (green) bands with descending hatches.
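For readers who want to evaluate the ansatz for $H^a(x, x, t)$ quoted above numerically, here is a direct transcription. We additionally assume a linear Regge trajectory $\alpha(t) = \alpha(0) + \alpha' t$. The parameter values in the example are illustrative only and are not the fitted values of [10] or [14].

```python
def h_crossover(x: float, t: float, *, n: float, r: float, alpha0: float,
                alpha_prime: float, b: float, m2: float, p: float) -> float:
    """H^a(x, x, t) on the cross-over line, transcribed from the ansatz above.

    alpha(t) = alpha0 + alpha_prime * t is an assumed linear Regge trajectory.
    t and m2 (= M^2) must be given in the same units, e.g. GeV^2.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie in (0, 1)")
    alpha = alpha0 + alpha_prime * t
    ratio = (1.0 - x) / (1.0 + x)
    return (n * r / (1.0 + x)
            * (2.0 * x / (1.0 + x)) ** (-alpha)  # small-x Regge behaviour
            * ratio ** b                         # large-x suppression
            / (1.0 - ratio * t / m2) ** p)       # t dependence


# Illustrative (made-up) parameters, t in GeV^2
params = dict(n=1.5, r=1.0, alpha0=0.43, alpha_prime=0.85, b=0.4, m2=0.8, p=2.0)
for t in (-0.1, -0.3):
    print(t, [round(h_crossover(x, t, **params), 3) for x in (0.05, 0.1, 0.2)])
```

The factor $(2x/(1+x))^{-\alpha(t)}$ is what builds the Regge behaviour into the model, and hence the theoretical bias at small $x_{\rm Bj}$.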

The neural network fit was performed by creating 50 neural networks with two neurons in the input layer (corresponding to the kinematical variables $x_B$ and $t$), 13 neurons in the hidden middle layer, and two neurons in the output layer (corresponding to $\Im{\rm m}\,\mathcal{H}$ and $\Re{\rm e}\,\mathcal{H}$), cf. Fig. 1. These were trained on $N_{\rm rep} = 50$ Monte Carlo replicas of HERMES data. We checked that the resulting CFF $\mathcal{H}$ does not depend significantly on the precise number of neurons in the hidden layer. The results are also presented in Fig. 3, where we show the neural network representation of $\Im{\rm m}\,\mathcal{H}$ and $\Re{\rm e}\,\mathcal{H}$ as (red) bands with ascending hatches.
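The architecture just described (two inputs, 13 hidden neurons, two outputs) and the averaging over the network ensemble can be sketched as below. Only the layer sizes and the ensemble size of 50 come from the text. The sigmoid hidden layer, linear outputs, bias terms and random initialization are our assumptions, and the networks are left untrained, so the printed numbers carry no physical meaning.

```python
import math
import random

LAYERS = [2, 13, 2]  # inputs (x_B, t) -> 13 hidden -> outputs (Im H, Re H)


def init_mlp(layers, rng):
    """One (weights, biases) pair per layer transition, Gaussian-initialized."""
    return [
        ([[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)],
         [rng.gauss(0.0, 1.0) for _ in range(n_out)])
        for n_in, n_out in zip(layers[:-1], layers[1:])
    ]


def forward(net, inputs):
    """Sigmoid hidden layers, linear output layer."""
    values = list(inputs)
    for k, (weights, biases) in enumerate(net):
        z = [sum(w * v for w, v in zip(row, values)) + b for row, b in zip(weights, biases)]
        is_output = k == len(net) - 1
        values = z if is_output else [1.0 / (1.0 + math.exp(-zi)) for zi in z]
    return values


def n_params(layers):
    """Number of weights plus biases."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers[:-1], layers[1:]))


rng = random.Random(0)
ensemble = [init_mlp(LAYERS, rng) for _ in range(50)]      # one net per replica
outputs = [forward(net, (0.1, -0.2)) for net in ensemble]  # at x_B = 0.1, t = -0.2
mean_im = sum(o[0] for o in outputs) / len(outputs)
mean_re = sum(o[1] for o in outputs) / len(outputs)
print(n_params(LAYERS), round(mean_im, 3), round(mean_re, 3))
```

After training, the same ensemble average (and the spread, per Eqs. (1)-(2)) at each kinematic point would give the bands shown in Fig. 3.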

Comparing the two approaches, one notices that in the kinematic region of the experimental data (roughly the middle-$x_B$ parts of the Fig. 3 panels) the neural network and model fit results coincide, i.e., the error bands are of similar width and overlap consistently. However, outside this data region the predictions of the two approaches can differ. There the uncertainty of the model fit is in general smaller, and we observe a strong disagreement in the low-$x_B$ region, reflecting the theoretical bias of the chosen model, which possesses an $x^{-\alpha(t)}$ Regge behaviour. The lesson learned from the toy example is that, even if we believe in Regge behaviour at small $x_B$, we should still regard the uncertainty from the neural network method as more realistic.

Figure 3: Neural network extraction of $\Im{\rm m}\,\mathcal{H}(x_{\rm Bj}, t)$ and $\Re{\rm e}\,\mathcal{H}(x_{\rm Bj}, t)$ (ascending hatches, red) from HERMES data [21] compared with model fits (descending hatches, green) for two different values of the momentum transfer squared $t$.



5 Conclusion 



Utilizing both a simplified toy example and HERMES measurements of photon electroproduction asymmetries, we demonstrated that neural networks combined with Monte Carlo error propagation provide a powerful and unbiased tool for extracting information from data. Comparisons with standard least-squares model fits reveal that the uncertainties obtained from neural network fits are reliable and realistic.

Relying on the hypothesis of $\mathcal{H}$ dominance, we extracted the CFF $\mathcal{H}$ from a completely unconstrained neural network fit. We expect that the extraction of all four leading twist-two CFFs ($\mathcal{H}$, $\mathcal{E}$, $\widetilde{\mathcal{H}}$ and $\widetilde{\mathcal{E}}$, or the corresponding GPDs) from presently available or soon-to-be available data will still be an ill-defined optimization problem. Thus, it might be necessary to implement in neural network fits some carefully chosen, theoretically robust constraints, such as dispersion relations, sum rules [24] and lattice input.